Let $X_1, \ldots, X_n$ be independent with mean zero, finite variances $\sigma_1^2, \ldots, \sigma_n^2$ and finite absolute third moments, $F_n$ the distribution function of $(X_1 + \cdots + X_n)/\sigma$ where $\sigma^2 = \sum_{i=1}^n \sigma_i^2$, and $\Phi$ that of the standard normal. Then the $L^1$ distance between $F_n$ and $\Phi$ satisfies

$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3} \sum_{i=1}^n E|X_i|^3.$$

In particular, when $X_1, \ldots, X_n$ are identically distributed with variance $\sigma^2$,

$$\|F_n - \Phi\|_1 \le \frac{E|X_1|^3}{\sqrt{n}\,\sigma^3} \quad \text{for all } n \in \mathbb{N},$$

corresponding to an $L^1$ Berry-Esseen constant of 1. A lower bound of

$$\frac{2\sqrt{\pi}(2\Phi(1) - 1) - (\sqrt{\pi} + \sqrt{2}) + 2e^{-1/2}\sqrt{2}}{\sqrt{\pi}} = 0.535377\ldots$$

on the smallest possible constant is provided.



1 Introduction



The classical central limit theorem allows the approximation of the distribution of sums of 'comparable' independent real valued random variables by the normal. As this theorem is asymptotic, it provides no information as to whether the resulting approximation is useful. For that purpose one may turn to the Berry-Esseen theorem, the most classical version giving supremum norm bounds between the distribution function of the normalized sum and that of the standard normal. Various authors have also considered Berry-Esseen type bounds using other metrics, and in particular bounds in $L^p$. The case $p = 1$, where the value

$$\|F - G\|_1 = \int_{-\infty}^{\infty} |F(x) - G(x)|\,dx$$

is used to measure the distance between distribution functions $F$ and $G$, is of some particular interest, and results using this metric are known as mean central limit theorems, see, for instance, [12], [4], [H] and [1]; the latter three of these works consider nonindependent summand variables. One motivation for studying $L^1$ bounds is that combined with one of type $L^\infty$, bounds on $L^p$ distance for all $p \in (1, \infty)$ may be obtained by the inequality

$$\|F - G\|_p^p \le \|F - G\|_\infty^{p-1}\, \|F - G\|_1.$$

For $\sigma \in (0, \infty)$ let $\mathcal{F}_\sigma$ be the collection of distributions with mean zero, variance $\sigma^2$, and finite absolute third moment. We prove the following Berry-Esseen type result for the mean central limit theorem.

Theorem 1.1 For $n \in \mathbb{N}$ let $X_1, \ldots, X_n$ be independent mean zero random variables with distributions $G_1 \in \mathcal{F}_{\sigma_1}, \ldots, G_n \in \mathcal{F}_{\sigma_n}$, and let $F_n$ be the distribution of

$$W = \frac{1}{\sigma} \sum_{i=1}^n X_i \quad \text{where} \quad \sigma^2 = \sum_{i=1}^n \sigma_i^2.$$

Then

$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3} \sum_{i=1}^n E|X_i|^3.$$

In particular, when $X_1, \ldots, X_n$ are identically distributed with distribution $G \in \mathcal{F}_\sigma$,

$$\|F_n - \Phi\|_1 \le \frac{E|X_1|^3}{\sqrt{n}\,\sigma^3} \quad \text{for all } n \in \mathbb{N}.$$
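As a numerical illustration of the identically distributed case (a sketch only, not part of the argument), one may take the summands to be symmetric $\pm 1$ variables, for which $\sigma = 1$ and $E|X_1|^3 = 1$, so that the bound reads $\|F_n - \Phi\|_1 \le 1/\sqrt{n}$. The helper names `H`, `abs_integral` and `l1_distance_rademacher` are hypothetical, and the code assumes Python 3.8 or later for `statistics.NormalDist`. It relies on the fact that $x\Phi(x) + \varphi(x)$ is an antiderivative of $\Phi$, so the integral of $|\Phi - F_n|$ can be evaluated piece by piece between consecutive atoms.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def H(x):
    """Antiderivative of Phi: H(x) = x*Phi(x) + phi(x), which tends to 0 at -infinity."""
    return x * STD_NORMAL.cdf(x) + STD_NORMAL.pdf(x)

def abs_integral(level, lo, hi):
    """Integral over [lo, hi] of |Phi(x) - level| for a constant level in (0, 1)."""
    c = min(max(STD_NORMAL.inv_cdf(level), lo), hi)  # sign change of Phi - level, clamped
    below = level * (c - lo) - (H(c) - H(lo))        # part where Phi <= level
    above = (H(hi) - H(c)) - level * (hi - c)        # part where Phi >= level
    return below + above

def l1_distance_rademacher(n):
    """||F_n - Phi||_1 for the normalized sum of n independent +-1 variables."""
    atoms = [(2 * k - n) / sqrt(n) for k in range(n + 1)]
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    total = H(atoms[0]) + (H(atoms[-1]) - atoms[-1])  # the two tails outside the support
    level = 0.0
    for k in range(n):
        level += probs[k]                             # value of F_n on [atoms[k], atoms[k+1])
        total += abs_integral(level, atoms[k], atoms[k + 1])
    return total

for n in (1, 2, 5, 10, 20):
    print(n, l1_distance_rademacher(n), 1 / sqrt(n))
```

Each printed distance should sit below the corresponding value $1/\sqrt{n}$ in the last column.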



For the case where all variables are identically distributed as $X$ having distribution $G$, letting

$$c_m = \inf\left\{C : \frac{\sqrt{n}\,\sigma^3 \|F_n - \Phi\|_1}{E|X|^3} \le C \text{ for all } G \in \mathcal{F}_\sigma \text{ and } n \ge m\right\}, \qquad (1)$$

the second part of Theorem 1.1 can be restated as the upper bound $c_1 \le 1$. We also provide the following lower bound.

Theorem 1.2 With $c_1$ given by (1) for $m = 1$,

$$c_1 \ge \frac{2\sqrt{\pi}(2\Phi(1) - 1) - (\sqrt{\pi} + \sqrt{2}) + 2e^{-1/2}\sqrt{2}}{\sqrt{\pi}} = 0.535377\ldots \qquad (2)$$



Clearly the elements of the sequence $\{c_m\}_{m \ge 1}$ are nonnegative and decreasing in $m$, and the sequence therefore has a limit, say $c_\infty$. Regarding limiting behavior Esseen [3] showed that

$$\lim_{n \to \infty} n^{1/2} \|F_n - \Phi\|_1 = A(G)$$

for an explicit constant $A(G)$ depending only on $G$. Zolotarev [18] provides the representation

$$A(G) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-1/2}^{1/2} \int_{-\infty}^{\infty} \left| \frac{\omega}{2}(1 - x^2) + hu \right| e^{-x^2/2}\,dx\,du \qquad (3)$$

where $\omega = |EX^3|/(3\sigma^2)$ and $h$ is the span of the distribution $G$ in case $G$ is lattice, and zero otherwise. Zolotarev obtains

$$\sup_{G \in \mathcal{F}_\sigma} \frac{\sigma^3 A(G)}{E|X|^3} = \frac{1}{2},$$

showing $c_\infty = 1/2$, giving the asymptotic $L^1$ Berry-Esseen constant value.

Here the focus is on nonasymptotic constants, and in particular on the constant $c_1$ which gives a bound for all $n \in \mathbb{N}$. Theorem 1.1 is shown using Stein's method (see [15], [17]) which uses the characterizing equation (5) for the normal, and an associated differential equation to obtain bounds on the normal approximation. More particularly, we employ the zero bias transformation, introduced in [U], and the evaluation of a Stein functional, as in [13]. In [H] it was shown that for all $X$ with mean zero and finite non-zero variance $\sigma^2$ there exists a unique distribution for a random variable $X^*$ such that

$$\sigma^2 E f'(X^*) = E[X f(X)] \qquad (4)$$

for all absolutely continuous functions $f$ for which these expectations exist. The zero bias transformation, mapping the distribution of $X$ to that of $X^*$, was motivated by the Stein characterization of the normal distribution [16], which states that $Z$ is normal with mean zero and variance $\sigma^2$ if and only if

$$\sigma^2 E f'(Z) = E[Z f(Z)] \qquad (5)$$

for all absolutely continuous functions $f$ for which these expectations exist. Hence, the mean zero normal with variance $\sigma^2$ is the unique fixed point of the zero bias transformation. How closeness to normality may be measured by the closeness of a distribution to its transform, and applications, are the topics of [5] and [6].

As shown in [9] and [7], for a random variable $X$ with $EX = 0$ and $\mathrm{Var}(X) = \sigma^2$, the distribution of $X^*$ is absolutely continuous with density and distribution functions given by, respectively,

$$g^*(x) = \sigma^{-2} E[X \mathbf{1}(X > x)] \quad \text{and} \quad G^*(x) = \sigma^{-2} E[X(X - x)\mathbf{1}(X \le x)]. \qquad (6)$$

Theorem 1.1 results by showing that the functional

$$B(G) = \frac{2\sigma^2 \|G^* - G\|_1}{E|X|^3} \qquad (7)$$

is bounded by 1 for all $X$ with distribution $G \in \mathcal{F}_\sigma$. As in (3) one may write out a more 'explicit' form for $B(G)$ using (6) and expressions for the moments on which $B(G)$ depends; however, such expressions appear to be of little value for the purposes of proving Theorem 1.1. In turn, the proof here employs convexity properties of $B(G)$ which depend on the behavior of the zero bias transformation on mixtures. We note also that the functional $B(G)$ is somewhat different from $A(G)$; for instance, $A(G)$ is zero for all nonlattice distributions with vanishing third moment, whereas $B(G)$ is zero only for mean zero normal distributions. Parallels to the current work appear in [13] where a different type of Stein functional was studied using somewhat similar methods, see in particular Proposition 4.1 there.
Let $\mathcal{L}(X)$ denote the distribution of a random variable $X$. Since the $L^1$ distance scales, that is, since for all $a \in \mathbb{R}$

$$\|\mathcal{L}(aX) - \mathcal{L}(aY)\|_1 = |a|\, \|\mathcal{L}(X) - \mathcal{L}(Y)\|_1, \qquad (8)$$

by replacing $\sigma_i^2$ by $\sigma_i^2/\sigma^2$ and $\|G_i^* - G_i\|_1$ by $\|G_i^* - G_i\|_1/\sigma$ in equation (16) of Theorem 2.1 of [7] we obtain

Proposition 1.1 Under the hypotheses of Theorem 1.1,

$$\|F_n - \Phi\|_1 \le \frac{1}{\sigma^3} \sum_{i=1}^n B(G_i)\, E|X_i|^3.$$

For a collection $\mathcal{F}$ of mean zero distributions with finite absolute third moments let

$$B(\mathcal{F}) = \sup_{G \in \mathcal{F}} B(G).$$

Clearly, Theorem 1.1 follows immediately from Proposition 1.1 and the following result.

Lemma 1.1 For all $\sigma \in (0, \infty)$,

$$B(\mathcal{F}_\sigma) = 1.$$

The equality to 1 in Lemma 1.1 improves the upper bound of 3 shown in [7]. Though our interest here is in best universal constants, we note that Proposition 1.1 provides $B(G)$ as a distribution specific $L^1$ Berry-Esseen constant in that

$$\|F_n - \Phi\|_1 \le \frac{B(G)\, E|X_1|^3}{\sqrt{n}\,\sigma^3} \quad \text{for all } n \in \mathbb{N}$$

when $X_1, \ldots, X_n$ are identically distributed according to $G \in \mathcal{F}_\sigma$. For instance $B(G) = 1/3$ when $G$ is a mean zero uniform distribution, and $B(G) = 1$ when $G$ is a mean zero two point distribution, see Corollary 2.1 of [7], and Lemmas 1.2 and 1.3 below.

We close this section with two preliminaries. The first collects some facts shown in [7], and the second demonstrates that to prove Lemma 1.1 it suffices to consider the class of random variables $\mathcal{F}_1$. Then, following Hoeffding [10] (see also [13]), in Section 2 we use a continuity property of $B(G)$ to show that its supremum over $\mathcal{F}_1$ is attained on finitely supported distributions. Exploiting a convexity type property of the zero bias transformation on mixtures over distributions having equal variances we reduce the calculation further to the calculation of the supremum over $D_3$, the collection of all mean zero distributions with variance 1, supported on at most three points. As three point distributions are in general a mixture of two two point distributions with unequal variances, an additional argument is given in Section 3, where a coupling of an $X$ with distribution $G \in D_3$ to a variable $X^*$ having the $X$ zero bias distribution is constructed, using the optimal $L^1$ couplings on the component two point distributions of which $G$ is the mixture, in order to obtain $B(G) \le 1$ for all $G \in D_3$. Theorem 1.2, the lower bound on $c_1$, is calculated in Section 4.

The following simple formula will be of some use. For $l$, $a$ and $b$ nonnegative,

$$\int_0^l \left| (a + b)\frac{u}{l} - a \right| du = \frac{l}{2} \cdot \frac{a^2 + b^2}{a + b}. \qquad (9)$$
Lemma 1.2 Let $G$ be the distribution of a nontrivial mean zero random variable $X$ supported on the two points $x < y$. Then $X^*$ is uniformly distributed on $[x, y]$,

$$EX^2 = -xy, \quad E|X^3| = \frac{-xy(y^2 + x^2)}{y - x} \quad \text{and} \quad \|\mathcal{L}(X^*) - \mathcal{L}(X)\|_1 = \frac{y^2 + x^2}{2(y - x)}.$$

In particular $B(G) = 1$ and

$$B(\mathcal{F}_1) \ge 1.$$

Proof: Being nontrivial $G$ has positive variance, and from (6) we see that the density $g^*$ of $G^*$, which at $w$ is proportional to $E[X\mathbf{1}(X > w)]$, is zero outside $[x, y]$ and constant within it, so $G^*(w) = (w - x)/(y - x)$ for $w \in [x, y]$. That $G$ has mean zero implies that the support points $x$ and $y$ satisfy $x < 0 < y$ and that $G$ gives positive probability $y/(y - x)$ and $-x/(y - x)$ to $x$ and $y$ respectively. The moment identities are immediate.

Making the change of variable $u = w - x$ and applying (9) with $a = y/(y - x)$, $b = -x/(y - x)$ and $l = y - x$ yields

$$\|\mathcal{L}(X^*) - \mathcal{L}(X)\|_1 = \int_x^y \left| \frac{w - x}{y - x} - \frac{y}{y - x} \right| dw = \frac{1}{2} \left( \frac{y^2 + x^2}{y - x} \right),$$

and (7) now gives $B(G) = 1$. ■



Lemma 1.3 Let $G \in \mathcal{F}_\sigma$ for some $\sigma \in (0, \infty)$, let $X$ have distribution $G$, and for $a \ne 0$ let $G_a$ denote the distribution of $aX$. Then $B(G_a) = B(G)$ and in particular

$$B(\mathcal{F}_\sigma) = B(\mathcal{F}_1) \quad \text{for all } \sigma \in (0, \infty).$$

Proof: That $aX^*$ has the same distribution as $(aX)^*$ follows from (4). Now the identities $\sigma_{aX}^2 = a^2 \sigma_X^2$, $E|aX|^3 = |a|^3 E|X^3|$ and (8) imply the first claim. Since

$$\{B(G) : G \in \mathcal{F}_\sigma\} = \{B(G) : G \in \mathcal{F}_1\},$$

taking supremum completes the proof. ■
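For a finitely supported mean zero distribution the functional $B(G)$ of (7) can be evaluated exactly, which gives a quick check on Lemma 1.2. The sketch below is an illustration under stated assumptions rather than anything taken from the text: the function name `stein_functional` is hypothetical, and exact rational arithmetic is used so that equalities can be tested. It uses the fact that the density in (6) is constant between consecutive atoms, so $G^* - G$ is linear there and its absolute integral is either a trapezoid area or, when the sign changes, the right hand side of (9).

```python
from fractions import Fraction as Fr

def stein_functional(points, probs):
    """B(G) = 2 sigma^2 ||G* - G||_1 / E|X|^3 for a mean zero G on finitely many points."""
    pairs = sorted(zip(points, probs))
    xs = [Fr(a) for a, _ in pairs]
    ps = [Fr(p) for _, p in pairs]
    assert sum(ps) == 1 and sum(a * p for a, p in zip(xs, ps)) == 0
    var = sum(a * a * p for a, p in zip(xs, ps))
    third = sum(abs(a) ** 3 * p for a, p in zip(xs, ps))
    dist = Fr(0)    # accumulates ||G* - G||_1
    G = Fr(0)       # value of G on the current interval [xs[k], xs[k+1])
    Gstar = Fr(0)   # value of G* at the left end of the current interval
    tail = Fr(0)    # E[X 1(X > w)] on the current interval
    for k in range(len(xs) - 1):
        G += ps[k]
        tail -= xs[k] * ps[k]           # E[X 1(X > w)] = -E[X 1(X <= w)] because EX = 0
        length = xs[k + 1] - xs[k]
        left = Gstar - G                # G* - G at the left end
        Gstar += tail / var * length    # the density g* is constant on the interval
        right = Gstar - G               # G* - G at the right end
        if left * right >= 0:           # no sign change: trapezoid area
            dist += length * (abs(left) + abs(right)) / 2
        else:                           # sign change: formula (9)
            dist += length * (left ** 2 + right ** 2) / (2 * (abs(left) + abs(right)))
    return 2 * var * dist / third

print(stein_functional([-1, 1], [Fr(1, 2), Fr(1, 2)]))
print(stein_functional([-2, Fr(-1, 2), Fr(1, 2), 2],
                       [Fr(1, 10), Fr(2, 5), Fr(2, 5), Fr(1, 10)]))
```

The first call evaluates a two point distribution, for which Lemma 1.2 gives the value 1; the second is a four point distribution with variance 1, whose value falls strictly below 1.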



2 Reduction to three point distributions 

Let $(S, \Sigma)$ be a measurable space, and let $\{m_s\}_{s \in S}$ be a collection of probability measures on $\mathbb{R}$ such that for each Borel subset $A \subset \mathbb{R}$ the function from $S$ to $[0, 1]$ given by

$$s \mapsto m_s(A)$$

is measurable. When $\mu$ is a probability measure on $(S, \Sigma)$, the set function given by

$$m_\mu(A) = \int_S m_s(A)\,\mu(ds)$$

is a probability measure, called the $\mu$ mixture of $\{m_s\}_{s \in S}$. With some slight abuse of notation, we let $E_\mu$ and $E_s$ denote expectations with respect to $m_\mu$ and $m_s$, and let $X_\mu$ and $X_s$ be random variables with distributions $m_\mu$ and $m_s$, respectively. For instance, for all functions $f$ which are integrable with respect to $m_\mu$ we have

$$E_\mu f(X) = \int E_s f(X)\,\mu(ds) \quad \text{which we also write as} \quad E f(X_\mu) = \int E f(X_s)\,\mu(ds).$$

In particular, if $\{m_s\}_{s \in S}$ is a collection of mean zero distributions with variances $\sigma_s^2 = EX_s^2$ and absolute third moments $\gamma_s = E|X_s^3|$, the mixture distribution $m_\mu$ has variance $\sigma_\mu^2$ and third absolute moment $\gamma_\mu$ given by

$$\sigma_\mu^2 = \int_S \sigma_s^2\,\mu(ds) \quad \text{and} \quad \gamma_\mu = \int_S \gamma_s\,\mu(ds),$$

where both may be infinite. Note that $\sigma_\mu^2 < \infty$ implies $\sigma_s^2 < \infty$ $\mu$-almost surely, and therefore that $m_s^*$, the $m_s$ zero bias distribution, exists $\mu$-almost surely.

Theorem 2.1 shows that the zero bias distribution of a mixture is a mixture of zero bias distributions with mixing measure the original measure weighted by the variance and rescaled. Define (arbitrarily, see Remark 2.1) the zero bias distribution of $\delta_0$, a point mass at zero, to be $\delta_0$. Write $X =_d Y$ when $X$ and $Y$ have the same distribution.

Theorem 2.1 Let $\{m_s, s \in S\}$ be a collection of mean zero distributions on $\mathbb{R}$ and $\mu$ a probability measure on $S$ such that the variance $\sigma_\mu^2$ of the mixture distribution is positive and finite. Then $m_\mu^*$, the $m_\mu$ zero bias distribution, exists and is given by the mixture

$$m_\mu^* = \int m_s^*\,d\nu \quad \text{where} \quad \frac{d\nu}{d\mu} = \frac{\sigma_s^2}{\sigma_\mu^2}.$$

In particular, $\nu = \mu$ if and only if $\sigma_s^2$ is a constant $\mu$ a.s.

Proof: The distribution $m_\mu^*$ exists as $m_\mu$ has mean zero and finite nonzero variance. Let $X_\mu^*$ have the $m_\mu$ zero bias distribution, and let $Y$ have the mixture distribution $\int m_s^*\,d\nu$. For any absolutely continuous function $f$ for which the expectations below exist,

$$\sigma_\mu^2 E f'(X_\mu^*) = E X_\mu f(X_\mu) = \int E X_s f(X_s)\,d\mu = \int \sigma_s^2 E f'(X_s^*)\,d\mu = \sigma_\mu^2 \int E f'(X_s^*)\,d\nu = \sigma_\mu^2 E f'(Y).$$

Since $E f'(X_\mu^*) = E f'(Y)$ for all such $f$ we conclude $X_\mu^* =_d Y$. ■
Remark 2.1 If $m_s = \delta_0$ for any $s \in S$ then $\sigma_s^2 = 0$, and therefore

$$\nu\{s \in S : m_s = \delta_0\} = 0.$$

Hence the mixture $m_\mu^*$ gives zero weight to $m_s^*$ for all such $s$, showing that $(\delta_0)^*$ may be defined arbitrarily.
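The reweighting in Theorem 2.1 can be checked directly on a small example. The sketch below is illustrative only, with hypothetical helper names (`two_point`, `variance`, `zero_bias_density`, `mix`): it mixes two mean zero two point distributions with unequal variances, computes the variance-weighted measure $\nu$, and compares the zero bias density of the mixture, obtained from (6), with the $\nu$ mixture of the component zero bias densities at a few points.

```python
from fractions import Fraction as Fr

def two_point(lo, hi):
    """Mean zero distribution on {lo, hi} with lo < 0 < hi, returned as (points, probs)."""
    lo, hi = Fr(lo), Fr(hi)
    return [lo, hi], [hi / (hi - lo), -lo / (hi - lo)]

def variance(dist):
    pts, prs = dist
    return sum(a * a * p for a, p in zip(pts, prs))

def zero_bias_density(dist, w):
    """g*(w) = E[X 1(X > w)] / Var(X), as in equation (6)."""
    pts, prs = dist
    return sum(a * p for a, p in zip(pts, prs) if a > w) / variance(dist)

def mix(weights, dists):
    """Finite mixture of finitely supported distributions, with common atoms merged."""
    mass = {}
    for wt, (pts, prs) in zip(weights, dists):
        for a, p in zip(pts, prs):
            mass[a] = mass.get(a, Fr(0)) + wt * p
    pts = sorted(mass)
    return pts, [mass[a] for a in pts]

components = [two_point(-3, 1), two_point(-1, 1)]   # variances 3 and 1
mu = [Fr(1, 4), Fr(3, 4)]
mixture = mix(mu, components)
# d(nu)/d(mu) = sigma_s^2 / sigma_mu^2
nu = [m * variance(d) / variance(mixture) for m, d in zip(mu, components)]

for w in (Fr(-2), Fr(-1, 2), Fr(1, 2)):
    direct = zero_bias_density(mixture, w)
    mixed = sum(n * zero_bias_density(d, w) for n, d in zip(nu, components))
    print(w, direct, mixed)
```

In this example $\nu$ differs from $\mu$ because the component variances differ, which is the situation excluded in Theorem 2.2 below.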
We now recall an equivalent form of the $L^1$ distance involving expectations of Lipschitz functions $L$ on $\mathbb{R}$,

$$\|F - G\|_1 = \sup_{f \in L} |E f(X) - E f(Y)| \quad \text{where} \quad L = \{f : |f(x) - f(y)| \le |x - y|\}, \qquad (10)$$

and $X$ and $Y$ have distribution $F$ and $G$, respectively. With a slight abuse of notation we may write $B(X)$ in place of $B(G)$ when $X$ has distribution $G$.

Theorem 2.2 Let $X_\mu$ be the $\mu$ mixture of a collection $\{X_s, s \in S\}$ of mean zero, variance 1 random variables satisfying $E|X_\mu^3| < \infty$. Then

$$B(X_\mu) \le \sup_{s \in S} B(X_s). \qquad (11)$$

In particular, if $\mathcal{C}$ is a collection of mean zero, variance 1 random variables with finite absolute third moments and $\mathcal{D} \subset \mathcal{C}$ such that every distribution in $\mathcal{C}$ can be represented as a mixture of distributions in $\mathcal{D}$, then

$$B(\mathcal{C}) = B(\mathcal{D}). \qquad (12)$$

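Proof: Since the variances $\sigma_s^2$ of $X_s$ are constant, the distribution of $X_\mu^*$ is the $\mu$ mixture of $\{X_s^*, s \in S\}$ by Theorem 2.1. Hence, applying (10),

$$\begin{aligned}
\|\mathcal{L}(X_\mu^*) - \mathcal{L}(X_\mu)\|_1 &= \sup_{f \in L} |E f(X_\mu^*) - E f(X_\mu)| \\
&= \sup_{f \in L} \left| \int_S E f(X_s^*)\,d\mu - \int_S E f(X_s)\,d\mu \right| \\
&\le \sup_{f \in L} \int_S |E f(X_s^*) - E f(X_s)|\,d\mu \\
&\le \sup_{f \in L} \int_S \|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1\,d\mu \\
&= \int_S \|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1\,d\mu. \qquad (13)
\end{aligned}$$

Now let $\tau$ be the measure on $(S, \Sigma)$ which is absolutely continuous with respect to $\mu$ with Radon-Nikodym derivative

$$\frac{d\tau}{d\mu} = \frac{E|X_s^3|}{E|X_\mu^3|}. \qquad (14)$$

This relation defines a probability measure as $E|X_\mu^3| = \int_S E|X_s^3|\,d\mu$. Noting also that $\mathrm{Var}(X_\mu) = \int_S EX_s^2\,d\mu = 1$, applying (13) we find

$$\begin{aligned}
B(X_\mu) &= \frac{2\|\mathcal{L}(X_\mu^*) - \mathcal{L}(X_\mu)\|_1}{E|X_\mu^3|} \\
&\le \frac{\int_S 2\|\mathcal{L}(X_s^*) - \mathcal{L}(X_s)\|_1\,d\mu}{E|X_\mu^3|} \\
&= \frac{\int_S B(X_s)\, E|X_s^3|\,d\mu}{E|X_\mu^3|} \\
&= \int_S B(X_s)\,d\tau \\
&\le \sup_{s \in S} B(X_s), \qquad (15)
\end{aligned}$$

proving (11).

Regarding (12), clearly $B(\mathcal{D}) \le B(\mathcal{C})$, and the reverse inequality follows from (11). ■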

Remark 2.2 The supremum over $S$ in (15), and therefore in the theorem, can be replaced with the essential supremum, with respect to $\tau$ in (14), over $S$.

Note that no bound of the type provided by Theorem 2.2 holds in general when taking mixtures of variables that have unequal variances. In particular, if $X_s \sim \mathcal{N}(0, \sigma_s^2)$ and $\sigma_s^2$ is not constant in $s$, then $X_\mu$ is a mixture of normals with unequal variances, which is not normal. Hence, in this case $B(X_\mu) > 0$, whereas $B(X_s) = 0$ for all $s$.

To apply Theorem 2.2 to reduce the computation of $B(\mathcal{F}_1)$ to finitely supported distributions we apply the following continuity property of the zero bias transformation, see Lemma 5.2 in [H]. We write $X_n \Rightarrow X$ for the convergence of $X_n$ to $X$ in distribution.

Lemma 2.1 Let $X$ and $X_n$, $n = 1, 2, \ldots$ be mean zero random variables with finite, nonzero variances. If

$$X_n \Rightarrow X \quad \text{and} \quad \lim_{n \to \infty} EX_n^2 = EX^2,$$

then

$$X_n^* \Rightarrow X^*.$$

For a distribution function $F$ let

$$F^{-1}(w) = \sup\{a : F(a) < w\} \quad \text{for all } w \in (0, 1). \qquad (16)$$

If $U$ is uniform on $[0, 1]$ then $F^{-1}(U)$ has distribution function $F$, and if $X_n$ and $X$ have distribution functions $F_n$ and $F$ respectively and $X_n \Rightarrow X$ then $F_n^{-1}(U) \to F^{-1}(U)$ a.s. (see, e.g., Theorem 2.1 of [2]). For distribution functions $F$ and $G$,

$$\|F - G\|_1 = \inf E|X - Y| \qquad (17)$$

where the infimum is over all joint distributions on $X, Y$ which have marginals $F$ and $G$ respectively, and the variables $F^{-1}(U)$ and $G^{-1}(U)$ achieve the minimal $L^1$ coupling, that is,

$$\|F - G\|_1 = E|F^{-1}(U) - G^{-1}(U)|, \qquad (18)$$

see [H] for details.

With the use of Lemma 2.1 we are able to prove the following continuity property of the functional $B(X)$.

Lemma 2.2 Let $X$ and $X_n$, $n \in \mathbb{N}$ be mean zero random variables with finite, nonzero absolute third moments. If

$$X_n \Rightarrow X, \quad \lim_{n \to \infty} EX_n^2 = EX^2 \quad \text{and} \quad E|X_n^3| \to E|X^3| \qquad (19)$$

then

$$B(X_n) \to B(X) \quad \text{as } n \to \infty.$$
Proof: By Lemma 2.1 we have $X_n^* \Rightarrow X^*$. Let $U$ be a uniformly distributed variable and set

$$(Y, Y_n, Y^*, Y_n^*) = \left( F_X^{-1}(U), F_{X_n}^{-1}(U), F_{X^*}^{-1}(U), F_{X_n^*}^{-1}(U) \right)$$

where $F_X$ denotes the distribution function of $X$, and so forth. Then $(Y, Y_n, Y^*, Y_n^*) =_d (X, X_n, X^*, X_n^*)$, $Y_n \to_{a.s.} Y$ and $Y_n^* \to_{a.s.} Y^*$, and by (18)

$$\|\mathcal{L}(X_n^*) - \mathcal{L}(X_n)\|_1 = E|Y_n^* - Y_n| \quad \text{and} \quad \|\mathcal{L}(X^*) - \mathcal{L}(X)\|_1 = E|Y^* - Y|.$$

By (4) with $f(x) = x^2\,\mathrm{sgn}(x)$ we find, for $Y$ for example, that

$$E|Y^3| = 2\,\mathrm{Var}(Y)\,E|Y^*|.$$

Hence, as $n \to \infty$, $EY_n^2 = EX_n^2 \to EX^2 = EY^2$ and

$$E|Y_n^*| = \frac{E|X_n^3|}{2EX_n^2} \to \frac{E|X^3|}{2EX^2} = \frac{E|Y^3|}{2EY^2} = E|Y^*|.$$

Hence $\{Y_n\}_{n \in \mathbb{N}}$ and $\{Y_n^*\}_{n \in \mathbb{N}}$ are uniformly integrable, so $\{Y_n^* - Y_n\}_{n \in \mathbb{N}}$ is uniformly integrable. As $Y_n^* - Y_n \to_{a.s.} Y^* - Y$ as $n \to \infty$,

$$\lim_{n \to \infty} \|\mathcal{L}(X_n^*) - \mathcal{L}(X_n)\|_1 = \lim_{n \to \infty} E|Y_n^* - Y_n| = E|Y^* - Y| = \|\mathcal{L}(X^*) - \mathcal{L}(X)\|_1. \qquad (20)$$

Combining (20) with the convergence of the variances and the absolute third moments as provided by (19), the proof is complete. ■
Lemmas 2.3 and 2.4 borrow much from Theorem 2.1 of [10], the latter lemma indeed being implicit. The results of [10] are not applied directly as $B(G)$ is not expressed as the expectation of $K(X)$ for some $K$ when $\mathcal{L}(X) = G$. For $m \ge 2$ let $D_m$ denote the collection of all mean zero, variance 1 distributions which are supported on at most $m$ points.

Lemma 2.3

$$B(\mathcal{F}_1) = B\Big( \bigcup_{m \ge 3} D_m \Big).$$

Proof: Letting $\mathcal{M}$ be the collection of distributions in $\mathcal{F}_1$ which have compact support we first show that

$$B(\mathcal{F}_1) \le B(\mathcal{M}). \qquad (21)$$

Let $\mathcal{L}(X) \in \mathcal{F}_1$ be given and for $n \in \mathbb{N}$ set $Y_n = X\mathbf{1}_{|X| \le n}$. Clearly $Y_n \to_{a.s.} X$. As $E|X^3| < \infty$ and $|Y_n^p| \le |X^p|$ for all $p \ge 0$, by the dominated convergence theorem

$$EY_n \to EX = 0, \quad EY_n^2 \to EX^2 = 1 \quad \text{and} \quad E|Y_n^3| \to E|X^3| \quad \text{as } n \to \infty. \qquad (22)$$

Letting

$$X_n = Y_n - EY_n \qquad (23)$$

we have $X_n \Rightarrow X$ by Slutsky's theorem, so, in view of (22), the hypotheses of Lemma 2.2 are satisfied, yielding

$$B(X_n) \to B(X) \quad \text{as } n \to \infty, \quad \text{with } \{X_n\}_{n \in \mathbb{N}} \subset \mathcal{M},$$

showing (21).

Now consider $\mathcal{L}(X) \in \mathcal{M}$, so that $|X| \le M$ a.s. for some $M > 0$. For each $n \in \mathbb{N}$ let



2" 



Since $|X| \le M$ a.s., each $Y_n$ is supported on finitely many points and $|Y_n| \le 2M$ for all $n$ sufficiently large. Clearly $Y_n \to X$ a.s., and (22) holds by the bounded convergence theorem. Now defining $X_n$ by (23) the hypotheses of Lemma 2.2 are satisfied, yielding

$$B(X_n) \to B(X) \quad \text{as } n \to \infty, \quad \text{with } \{X_n\}_{n \in \mathbb{N}} \subset \bigcup_{m \ge 3} D_m,$$

showing $B(\mathcal{M}) \le B(\bigcup_{m \ge 3} D_m)$. Combining this inequality with (21) yields $B(\mathcal{F}_1) \le B(\bigcup_{m \ge 3} D_m)$ and therefore the lemma, the reverse inequality being obvious. ■

Lemma 2.4 Every distribution in $\bigcup_{m \ge 3} D_m$ can be expressed as a finite mixture of $D_3$ distributions.

Proof: The lemma is trivially true for $m = 3$ so consider $m > 3$ and assume that the lemma holds for all integers from 3 to $m - 1$.

The distribution of any $X \in D_m$ is determined by the supporting values $a_1 < \cdots < a_m$ and a vector of probabilities $p = (p_1, \ldots, p_m)'$. If any of the components of $p$ are zero then $X \in D_k$ for $k < m$ and the induction would be finished, so assume all components of $p$ are strictly positive. As $X \in D_m$ the vector $p$ must satisfy

$$Ap = c \quad \text{where} \quad A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_m \\ a_1^2 & a_2^2 & \cdots & a_m^2 \end{pmatrix} \quad \text{and} \quad c = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$

Since $A \in \mathbb{R}^{3 \times m}$ with $m > 3$, $\mathcal{N}(A) \ne \{0\}$, that is, there exists $v \ne 0$ with

$$Av = 0. \qquad (24)$$

Since $v \ne 0$ and the equation specified by the first row of $A$ is exactly that $\sum_i v_i = 0$, the vector $v$ contains both positive and negative numbers. Since the vector $p$ has strictly positive components, the numbers $t_1$ and $t_2$ given by

$$t_1 = \inf\{t > 0 : \min_i (p_i + t v_i) = 0\} \quad \text{and} \quad t_2 = \inf\{t > 0 : \min_i (p_i - t v_i) = 0\}$$

are both strictly positive. Note that

$$p_1 = p + t_1 v \quad \text{and} \quad p_2 = p - t_2 v$$

satisfy

$$Ap_i = A(p \pm t_i v) = Ap = c \quad \text{for } i \in \{1, 2\}$$

by (24), so that $p_1$ and $p_2$ are probability vectors, as their components are nonnegative and sum to one. Additionally, the corresponding distributions have mean zero and variance 1, and in each of these two vectors at least one component has been set to zero. Hence we may express the $m$ point probability vector $p$ as the mixture

$$p = \frac{t_2}{t_1 + t_2}\, p_1 + \frac{t_1}{t_1 + t_2}\, p_2$$

of probability vectors on at most $m - 1$ support points, thus showing $X$ to be the mixture of two distributions in $D_{m-1}$, completing the induction. ■
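The induction step can be carried out concretely. The sketch below is an illustration under assumptions that are not in the text: the function names `null_vector` and `split` are hypothetical, and the null vector is chosen as the divided-difference weights $v_i = 1/\prod_{j \ne i}(a_i - a_j)$, which annihilate every polynomial of degree at most $m - 2$ in the $a_i$ and hence satisfy (24) whenever $m > 3$. Any other nonzero null vector of $A$ would serve equally well.

```python
from fractions import Fraction as Fr

def null_vector(support):
    """A nonzero v with sum_i v_i * a_i**k = 0 for k = 0, 1, 2 (requires more than 3 points)."""
    v = []
    for i, ai in enumerate(support):
        prod = Fr(1)
        for j, aj in enumerate(support):
            if j != i:
                prod *= (ai - aj)
        v.append(1 / prod)
    return v

def split(support, p):
    """Write p as w*p1 + (1-w)*p2, where p1 and p2 each have at least one zero component."""
    v = null_vector(support)
    t1 = min(-pi / vi for pi, vi in zip(p, v) if vi < 0)  # first t with min_i(p_i + t v_i) = 0
    t2 = min(pi / vi for pi, vi in zip(p, v) if vi > 0)   # first t with min_i(p_i - t v_i) = 0
    p1 = [pi + t1 * vi for pi, vi in zip(p, v)]
    p2 = [pi - t2 * vi for pi, vi in zip(p, v)]
    return p1, p2, t2 / (t1 + t2)

# A mean zero, variance 1 distribution on four points.
support = [Fr(-2), Fr(-1, 2), Fr(1, 2), Fr(2)]
p = [Fr(1, 10), Fr(2, 5), Fr(2, 5), Fr(1, 10)]
p1, p2, w = split(support, p)
print(p1, p2, w)
```

Both returned vectors keep mean zero and variance 1 on the same support points, and repeating the step on any component that still charges more than three points terminates in $D_3$.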
The following theorem is an immediate consequence of Theorem 2.2 and Lemmas 2.3 and 2.4.

Theorem 2.3

$$B(\mathcal{F}_1) = B(D_3).$$

Hence we now restrict attention to $D_3$.

3 Bound for $D_3$ distributions

Clearly, Lemma 1.1 follows from Lemma 1.3, Theorem 2.3 and Theorem 3.1 below, which shows $B(D_3) = 1$. We prove Theorem 3.1 with the help of the following result.

Lemma 3.1 Let $x < y < 0 < z$ and let $m_1$ and $m_0$ be the unique mean zero distributions with support $\{x, z\}$ and $\{y, z\}$ respectively, that is,

$$m_1(\{w\}) = \begin{cases} \frac{z}{z - x} & \text{if } w = x \\ \frac{-x}{z - x} & \text{if } w = z \\ 0 & \text{otherwise} \end{cases} \quad \text{and} \quad m_0(\{w\}) = \begin{cases} \frac{z}{z - y} & \text{if } w = y \\ \frac{-y}{z - y} & \text{if } w = z \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\|m_1^* - m_0\|_1 \le \|m_1^* - m_1\|_1. \qquad (25)$$

Proof: By Lemma 1.2,

$$\begin{aligned}
\|m_1^* - m_1\|_1 &= \frac{z^2 + x^2}{2(z - x)} = \frac{(z^2 + x^2)(z - y)^2}{2(z - x)(z - y)^2} \\
&= \frac{z^4 - 2yz^3 + y^2z^2 + x^2z^2 - 2x^2yz + x^2y^2}{2(z - x)(z - y)^2} \\
&= \frac{(z^4 - 2yz^3 + x^2z^2 - 2x^2yz) + y^2z^2 + x^2y^2}{2(z - x)(z - y)^2}. \qquad (26)
\end{aligned}$$

Let $F_1$, $F_0$, $F_1^*$ and $F_0^*$ denote the distribution functions of $m_1$, $m_0$, $m_1^*$ and $m_0^*$ respectively. By Lemma 1.2, $m_1^*$ and $m_0^*$ are uniform over $[x, z]$ and $[y, z]$ respectively. Letting $J_1 = [x, y)$ and $J_2 = [y, z]$ we have

$$\|m_1^* - m_0\|_1 = I_1 + I_2 \quad \text{where} \quad I_i = \int_{J_i} |F_1^*(w) - F_0(w)|\,dw \quad \text{for } i \in \{1, 2\}.$$

Since $F_1^*(w) \ge 0 = F_0(w)$ for all $w \in J_1$,

$$I_1 = \int_x^y \frac{w - x}{z - x}\,dw = \frac{1}{2} \cdot \frac{(y - x)^2}{z - x} = \frac{(y - x)^2 (z - y)^2}{2(z - x)(z - y)^2}. \qquad (27)$$

The calculation of $I_2$ depends on the relative magnitudes of $F_1^*(y) = (y - x)/(z - x)$ and $F_0(y) = z/(z - y)$. We note that

$$F_1^*(y) \le F_0(y) \quad \text{if and only if} \quad y(x + z) \le y^2 + z^2. \qquad (28)$$

When $F_1^*(y) \le F_0(y)$ the quantities $a = \frac{z}{z - y} - \frac{y - x}{z - x}$, $b = \frac{-y}{z - y}$ and $l = z - y$ are all nonnegative, so applying (9) after the change of variable $u = w - y$ yields

$$\begin{aligned}
I_2 &= \int_y^z \left| \frac{w - x}{z - x} - \frac{z}{z - y} \right| dw = \frac{z - y}{2} \cdot \frac{\left( \frac{z}{z - y} - \frac{y - x}{z - x} \right)^2 + \left( \frac{y}{z - y} \right)^2}{\frac{z - y}{z - x}} \\
&= \frac{\left( z(z - x) - (y - x)(z - y) \right)^2 + \left( y(z - x) \right)^2}{2(z - x)(z - y)^2} \\
&= \frac{(y^2 + z^2)(z - x)^2 - 2z(z - x)(y - x)(z - y) + (y - x)^2 (z - y)^2}{2(z - x)(z - y)^2}. \qquad (29)
\end{aligned}$$

Adding (27) to (29) yields

$$\begin{aligned}
\|m_1^* - m_0\|_1 &= \frac{(y^2 + z^2)(z - x)^2 - 2z(z - x)(y - x)(z - y) + 2(y - x)^2 (z - y)^2}{2(z - x)(z - y)^2} \\
&= \frac{(z^4 - 2yz^3 + x^2z^2 - 2x^2yz) + 5y^2z^2 + 3x^2y^2 - 4xy^3 + 4xy^2z - 4xyz^2 + 2y^4 - 4y^3z}{2(z - x)(z - y)^2},
\end{aligned}$$

and now, subtracting from (26) and simplifying by noting that the terms in the parentheses in the numerators of these two expressions are equal, we find

$$\begin{aligned}
\|m_1^* - m_1\|_1 - \|m_1^* - m_0\|_1 &= \frac{-4y^2z^2 - 2x^2y^2 + 4xy^3 - 4xy^2z + 4xyz^2 - 2y^4 + 4y^3z}{2(z - x)(z - y)^2} \\
&= \frac{-y(y - x)\left( y^2 + 2z^2 - y(x + 2z) \right)}{(z - x)(z - y)^2}. \qquad (30)
\end{aligned}$$

The denominator in (30) is positive, as are $-y$ and $y - x$. For the remaining term (28) yields

$$y^2 + 2z^2 - y(x + 2z) \ge z(z - y) > 0;$$

hence (30) is positive, thus proving (25) when $F_1^*(y) \le F_0(y)$.

When $F_1^*(y) > F_0(y)$, and therefore $y^2 + z^2 < y(x + z)$ by (28), we have $F_1^*(w) \ge F_0(w)$ for all $w \in [x, z]$ and hence

$$\begin{aligned}
\|m_1^* - m_0\|_1 &= \int_x^z |F_1^*(w) - F_0(w)|\,dw = \int_x^z (F_1^*(w) - F_0(w))\,dw = \int_x^z F_1^*(w)\,dw - \int_x^z F_0(w)\,dw \\
&= \int_x^z \frac{w - x}{z - x}\,dw - \int_y^z \frac{z}{z - y}\,dw = \frac{1}{2} \cdot \frac{(z - x)^2}{z - x} - z = \frac{1}{2}(z - x) - z = -\frac{x + z}{2}. \qquad (31)
\end{aligned}$$

Now, since

$$(x + z)(x - z) = x^2 - z^2 < z^2 + x^2,$$

noting that $z - x > 0$, dividing by $2(z - x)$ yields, under the case at hand, by (31) and Lemma 1.2, that

$$\|m_1^* - m_0\|_1 = -\frac{x + z}{2} < \frac{z^2 + x^2}{2(z - x)} = \|m_1^* - m_1\|_1,$$

thus proving inequality (25) when $F_1^*(y) > F_0(y)$, and therefore the lemma. ■
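Inequality (25) lends itself to a direct check on a grid of support points. The following sketch is illustrative only, with hypothetical function names `lhs_25` and `rhs_25`; it evaluates $\|m_1^* - m_0\|_1$ as $I_1 + I_2$, treating the two cases of (28) separately, and compares the result with the closed form for $\|m_1^* - m_1\|_1$ from Lemma 1.2, in exact rational arithmetic.

```python
from fractions import Fraction as Fr

def lhs_25(x, y, z):
    """||m_1* - m_0||_1 for x < y < 0 < z, with m_1* uniform on [x, z] and m_0 on {y, z}."""
    x, y, z = Fr(x), Fr(y), Fr(z)
    i1 = (y - x) ** 2 / (2 * (z - x))        # equation (27)
    left = (y - x) / (z - x) - z / (z - y)   # F_1* - F_0 at w = y
    right = -y / (z - y)                     # limit of F_1* - F_0 as w increases to z
    length = z - y
    if left >= 0:                            # case F_1*(y) >= F_0(y): no sign change
        i2 = length * (left + right) / 2
    else:                                    # case F_1*(y) < F_0(y): formula (9)
        i2 = length * (left ** 2 + right ** 2) / (2 * (right - left))
    return i1 + i2

def rhs_25(x, z):
    """||m_1* - m_1||_1 from Lemma 1.2."""
    x, z = Fr(x), Fr(z)
    return (z * z + x * x) / (2 * (z - x))

violations = [(x, y, z)
              for x in range(-6, -1)
              for y in range(x + 1, 0)
              for z in range(1, 7)
              if lhs_25(x, y, z) > rhs_25(x, z)]
print(violations)
```

The list of violations should be empty; the grid includes points from both cases of (28), for example $(x, y, z) = (-2, -1, 1)$ and $(-6, -1, 1)$.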
Theorem 3.1

$$B(D_3) = 1.$$

Proof: Let $X \in D_3$ be arbitrary and suppose $X$ is supported on the three points $x < y < z$. Lemma 1.2 shows that $B(X) = 1$ if $X$ is supported on two points, so we may assume that $X$ gives positive probability to $x$, $y$ and $z$. We first prove

$$B(X) \le 1 \quad \text{when } X \in D_3 \text{ is positively supported on the nonzero points } x, y, z. \qquad (32)$$

That $EX = 0$ implies $x < 0 < z$. After proving (32) we handle the remaining case where $y = 0$ by a continuity argument.

Let $X$ be supported on $x < y < z$ with $y \ne 0$. Lemma 1.3 with $a = -1$ implies $B(-X) = B(X)$, so we may assume without loss of generality that $x < y < 0 < z$. Let $m_1$ and $m_0$ be the unique mean zero distributions supported on $\{x, z\}$ and $\{y, z\}$, respectively, and let $\mathcal{L}(X_1) = m_1$ and $\mathcal{L}(X_0) = m_0$. As generally every mean zero distribution having no atom at zero can be represented as a mixture of mean zero two point distributions (as in the Skorohod representation, see [2]), letting

$$\mathcal{L}(X_\alpha) = \alpha m_1 + (1 - \alpha) m_0, \qquad (33)$$

we have $\mathcal{L}(X) = \mathcal{L}(X_\alpha)$ for some $\alpha \in [0, 1]$; in fact, in this particular case one may verify directly that $P(X = x)/P(X_1 = x) \in (0, 1)$ and that (33) holds when $\alpha$ assumes this value. Therefore to prove (32) it suffices to show

$$B(X_\alpha) \le 1 \quad \text{for all } \alpha \in [0, 1]. \qquad (34)$$

By Lemma 1.2

$$EX_1^2 = -zx \quad \text{and} \quad EX_0^2 = -zy \qquad (35)$$

and by (33) the variance of $X_\alpha$ is given by

$$EX_\alpha^2 = \alpha EX_1^2 + (1 - \alpha) EX_0^2 = -(\alpha zx + (1 - \alpha) zy) = -z(\alpha x + (1 - \alpha) y). \qquad (36)$$

Applying Theorem 2.1 with $S = \{0, 1\}$ and $\mu$ the probability measure putting mass $\alpha$ and $1 - \alpha$ on the points 1 and 0, respectively, in view of (35) and (36), $m_\alpha^*$, the $X_\alpha$ zero bias distribution, is given by the mixture

$$m_\alpha^* = \beta m_1^* + (1 - \beta) m_0^* \quad \text{where} \quad \beta = \frac{\alpha x}{\alpha x + (1 - \alpha) y}. \qquad (37)$$

Since $x < y < 0$ we have

$$\frac{\beta}{1 - \beta} = \frac{\alpha}{1 - \alpha} \cdot \frac{x}{y} \ge \frac{\alpha}{1 - \alpha} \quad \text{and therefore} \quad \beta \ge \alpha. \qquad (38)$$
Let $F_1$, $F_0$, $F_1^*$ and $F_0^*$ denote the distribution functions of $m_1$, $m_0$, $m_1^*$ and $m_0^*$, respectively. Let $U$ be a standard uniform variable and, with the inverse functions below given by (16), set

$$(Y_1, Y_0, Y_1^*, Y_0^*) = \left( F_1^{-1}(U), F_0^{-1}(U), (F_1^*)^{-1}(U), (F_0^*)^{-1}(U) \right).$$

Then $Y_i =_d X_i$, $Y_i^* =_d X_i^*$ for $i \in \{0, 1\}$, and by (18) all pairs of the variables $Y_1, Y_0, Y_1^*, Y_0^*$ achieve the $L^1$ distance between their respective distributions. Now, recalling (38), let $(Y_\alpha, Y_\alpha^*)$ be defined on the same space with joint distribution given by the mixture

$$\mathcal{L}(Y_\alpha, Y_\alpha^*) = \alpha \mathcal{L}(Y_1, Y_1^*) + (1 - \beta) \mathcal{L}(Y_0, Y_0^*) + (\beta - \alpha) \mathcal{L}(Y_0, Y_1^*).$$

Then $(Y_\alpha, Y_\alpha^*)$ has marginals $Y_\alpha =_d X_\alpha$ and $Y_\alpha^* =_d X_\alpha^*$, hence by (17)

$$\|m_\alpha^* - m_\alpha\|_1 \le \alpha \|m_1^* - m_1\|_1 + (1 - \beta) \|m_0^* - m_0\|_1 + (\beta - \alpha) \|m_1^* - m_0\|_1. \qquad (39)$$

Lemma 1.2 shows $B(X_i) = 1$, that is, that $E|X_i^3| = 2EX_i^2 \|m_i^* - m_i\|_1$ for $i = 0, 1$, so (33) gives

$$E|X_\alpha^3| = 2\left( \alpha EX_1^2 \|m_1^* - m_1\|_1 + (1 - \alpha) EX_0^2 \|m_0^* - m_0\|_1 \right),$$

and now by (35), (36) and (37) we find

$$\frac{E|X_\alpha^3|}{2EX_\alpha^2} = \frac{\alpha x \|m_1^* - m_1\|_1 + (1 - \alpha) y \|m_0^* - m_0\|_1}{\alpha x + (1 - \alpha) y} = \beta \|m_1^* - m_1\|_1 + (1 - \beta) \|m_0^* - m_0\|_1. \qquad (40)$$

Lemma 3.1 shows that the right hand side, and therefore the left hand side, of (39) is bounded by (40), that is, that $B(X_\alpha) = 2EX_\alpha^2 \|m_\alpha^* - m_\alpha\|_1 / E|X_\alpha^3| \le 1$, completing the proof of (34), and hence of (32).

Lastly we consider the case where the mean zero random variable $X$ is positively supported on $\{x, 0, z\}$ with $x < 0 < z$ and $P(X = 0) = q \in (0, 1)$. For $n \in \mathbb{N}$ let

$$Y_n = X\mathbf{1}(X \ne 0) + \frac{1}{n}\mathbf{1}(X = 0) \quad \text{and} \quad X_n = Y_n - EY_n.$$

As $n \to \infty$ we see that $Y_n \to_{a.s.} X$ and $EY_n = q/n \to 0$, so that $X_n \to_{a.s.} X$, and the bounded convergence theorem shows that $X_n$ satisfies the hypothesis of Lemma 2.2. Hence $B(X_n) \to B(X)$ as $n \to \infty$. For all $n \in \mathbb{N}$ such that $1/n < z$ the distribution of $X_n$ is positively supported on the three distinct, nonzero points $x - q/n < (1 - q)/n < z - q/n$, so by (32) $B(X_n) \le 1$ for all such $n$. Therefore the limit $B(X)$ is also bounded by 1. ■

4 Lower Bound 

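By (1) with $m = 1$ and $\mathcal{L}(X) = G \in \mathcal{F}_1$,

$$\|F_n - \Phi\|_1 \le \frac{c_1 E|X^3|}{\sqrt{n}} \quad \text{for all } n \in \mathbb{N},$$

and in particular for $n = 1$

$$c_1 \ge \frac{\|F_1 - \Phi\|_1}{E|X^3|} = \frac{\|G - \Phi\|_1}{E|X^3|}. \qquad (41)$$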
Motivated by Theorem 2.3, that two point distributions achieve the supremum of $B(G)$, for $p \in (0, 1)$ and $q = 1 - p$ let

$$X = \frac{B - p}{\sqrt{pq}},$$

where $B$ is a Bernoulli variable with $P(B = 1) = p = 1 - P(B = 0)$. The distribution function $G_p$ of $X$ is given by

$$G_p(x) = \begin{cases} 0 & \text{for } x < -\sqrt{p/q} \\ q & \text{for } -\sqrt{p/q} \le x < \sqrt{q/p} \\ 1 & \text{for } \sqrt{q/p} \le x \end{cases}$$

and therefore the $L^1$ distance between $G_p$ and the standard normal is given by

$$\|G_p - \Phi\|_1 = \int_{-\infty}^{-\sqrt{p/q}} \Phi(x)\,dx + \int_{-\sqrt{p/q}}^{\sqrt{q/p}} |\Phi(x) - q|\,dx + \int_{\sqrt{q/p}}^{\infty} |\Phi(x) - 1|\,dx.$$

As $G_p \in \mathcal{F}_1$ for all $p \in (0, 1)$ and $E|X^3| = (p^2 + q^2)/\sqrt{pq}$, letting

$$\psi(p) = \frac{\sqrt{pq}}{p^2 + q^2}\, \|G_p - \Phi\|_1 \quad \text{for } p \in (0, 1),$$

inequality (41) gives $c_1 \ge \psi(p)$ for all $p \in (0, 1)$, and $\psi(1/2)$ yields (2).
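The value in (2) can be reproduced numerically. The sketch below is an illustration with hypothetical names (`H`, `psi`, `closed_form`), assuming Python 3.8 or later for `statistics.NormalDist`; it evaluates the three integrals in $\|G_p - \Phi\|_1$ through the antiderivative $x\Phi(x) + \varphi(x)$ of $\Phi$, splitting the middle integral at the point where $\Phi$ crosses the level $q$, and compares $\psi(1/2)$ with the closed form stated in Theorem 1.2.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def H(x):
    """Antiderivative of Phi: H(x) = x*Phi(x) + phi(x), which tends to 0 at -infinity."""
    return x * STD_NORMAL.cdf(x) + STD_NORMAL.pdf(x)

def psi(p):
    """psi(p) = sqrt(pq)/(p^2 + q^2) * ||G_p - Phi||_1 for the standardized Bernoulli(p)."""
    q = 1 - p
    lo, hi = -sqrt(p / q), sqrt(q / p)             # the two atoms of G_p
    c = min(max(STD_NORMAL.inv_cdf(q), lo), hi)    # where Phi crosses q, clamped to [lo, hi]
    middle = (q * (c - lo) - (H(c) - H(lo))) + ((H(hi) - H(c)) - q * (hi - c))
    distance = H(lo) + middle + (H(hi) - hi)       # left tail + middle + right tail
    return sqrt(p * q) / (p * p + q * q) * distance

closed_form = (2 * sqrt(pi) * (2 * STD_NORMAL.cdf(1) - 1) - (sqrt(pi) + sqrt(2))
               + 2 * exp(-0.5) * sqrt(2)) / sqrt(pi)
print(psi(0.5), closed_form)
```

The two printed numbers should agree to floating point accuracy, and by Theorem 1.1 every value $\psi(p)$ is at most 1.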